1.1 Corpus Description¶
This study compares Reddit discussions from Virginia Tech and James Madison University to analyze how students at each institution engage with their surrounding communities. Our hypothesis predicted that because most Virginia college students come from similar areas, Virginia Tech students would frequently reference other in-state universities in their online conversations. However, our exploratory data analysis using a geoparser and Voyant revealed that Virginia Tech students frequently discuss the surrounding town of Blacksburg, while James Madison students are less focused on Harrisonburg. Spatial analysis further showed that Virginia Tech's online discourse is more locally centered, suggesting stronger connections with the community. Sentiment mapping supported this finding, revealing more positive emotions tied to local spaces in Blacksburg than in Harrisonburg, where sentiment clustered around the JMU campus. Overall, our findings suggest that Virginia Tech students are more actively engaged with their town community, possibly because the university's bar culture and local activities connect students to the community.
1.2 Hypothesis¶
Virginia Tech is more community focused and therefore will have more positive sentiments around Blacksburg than James Madison will have around Harrisonburg.
Part 2: Exploratory Data Analysis¶
📊 Action Items¶
📋 Data Preparation
- Use the zip file in your group directory (group_data_packets/voyant/INSTITUTION_JMU.zip) to run Voyant analysis at https://www.voyant-tools.org
✍️ Writing Task¶
Write a brief overview of the type of evidence you are hoping to find with your Voyant analysis.
💡 Example: "We will compare the difference in 'school spirit' between UNC and JMU by looking at terms related to sports, social events, and mascots in the trends tool and the context tool."
2.1 Visualization 1 Relative Frequencies Harrisonburg vs Blacksburg¶
This visualization shows relative frequencies and indicates that Virginia Tech students talk about Blacksburg more than James Madison students talk about Harrisonburg. This helps our research because it supports the claim that Virginia Tech students are more involved with and connected to their community.
Visualization Analysis¶
This visualization supports our hypothesis because Virginia Tech students mention Blacksburg at a higher relative frequency than James Madison students mention Harrisonburg.
2.2 Visualization 2 Relative Frequencies JMU vs Virginia Tech¶
This visualization also shows relative frequencies, indicating that James Madison University students talk more about JMU than Virginia Tech students talk about VT. This could help our research because it shows that JMU students focus on JMU and what goes on inside the university rather than on Harrisonburg, while Virginia Tech students talk more about Blacksburg and what's going on in their town rather than about Virginia Tech itself.
Visualization Analysis¶
This visualization supports our hypothesis because it shows that Virginia Tech students talk less about Virginia Tech itself and more about Blacksburg as a whole, while James Madison students talk more about what is going on at JMU and less about Harrisonburg.
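The relative frequencies Voyant reports normalize raw term counts by corpus size, which is what makes the two subreddits comparable even though they contain different numbers of posts. A minimal sketch of that calculation, using hypothetical toy token lists rather than the actual corpora:

```python
# Relative frequency = term count / total tokens; normalizing by corpus
# size makes counts comparable across corpora of different lengths.
def relative_frequency(tokens, term):
    return tokens.count(term) / len(tokens)

# Hypothetical toy corpora (not the real Reddit data)
vt_tokens = ["blacksburg", "vt", "blacksburg", "football"]
jmu_tokens = ["jmu", "jmu", "harrisonburg", "dukes"]

vt_rel = relative_frequency(vt_tokens, "blacksburg")      # 0.5
jmu_rel = relative_frequency(jmu_tokens, "harrisonburg")  # 0.25
```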
Part 3: Data Cleaning and Refinement¶
# =============================================================================
# SETUP: Import Libraries and Load Data
# =============================================================================
# This cell sets up all the tools we need for spatial sentiment analysis
# Force reload to pick up any changes to data_cleaning_utils
# This ensures we get the latest version of our custom functions
import importlib
import sys
if 'data_cleaning_utils' in sys.modules:
    importlib.reload(sys.modules['data_cleaning_utils'])
# Core data analysis library - like Excel but for Python
import pandas as pd
# Import our custom functions for cleaning and analyzing location data
from data_cleaning_utils import (
    clean_institution_dataframe,  # Standardizes and cleans location data
    get_data_type_summary,        # Shows what types of data we have
    get_null_value_summary,       # Identifies missing data
    create_location_counts,       # Counts how often places are mentioned
    create_location_sentiment,    # Calculates average emotions by location
    create_time_animation_data,   # Prepares data for animated time series
)
# Interactive plotting library - creates maps and charts
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as pyo
# =============================================================================
# CONFIGURE PLOTLY FOR HTML EXPORT
# =============================================================================
# Configure Plotly for optimal HTML export compatibility
# Method 1: Set renderer for HTML export (use 'notebook' for Jupyter environments)
pio.renderers.default = "notebook"
# Method 2: Configure Plotly for offline use (embeds JavaScript in HTML)
pyo.init_notebook_mode(connected=False) # False = fully offline, no external dependencies
# Method 3: Set template for clean HTML appearance
pio.templates.default = "plotly_white"
# Load the cleaned JMU Reddit data (already processed and ready to use)
# This contains: posts, locations, coordinates, sentiment scores, and dates
df_jmu = pd.read_pickle("assets/data/jmu_reddit_geoparsed_clean.pickle")
# =============================================================================
# LOAD YOUR INSTITUTION'S DATA
# =============================================================================
# Replace the group number and institution name with your assigned data
# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number (e.g., "group_1", "group_2", etc.)
# Replace "UNC_processed.csv" with your institution's file name
df_institution = pd.read_csv("group_data_packets/group_5/python/VirginiaTech_processed_clean.csv")
# =============================================================================
# CREATE RAW LOCATION MAP (Before Cleaning)
# =============================================================================
# This shows the "messy" data before we fix location errors
# You'll see why data cleaning is essential!
# STEP 1: Count how many times each place is mentioned
# Group identical place names together and count occurrences
place_counts = df_institution.groupby('place').agg({
    'place': 'count',        # Count how many times each place appears
    'latitude': 'first',     # Take the first latitude coordinate for each place
    'longitude': 'first',    # Take the first longitude coordinate for each place
    'place_type': 'first'    # Take the first place type classification
}).rename(columns={'place': 'count'})  # Rename the count column for clarity
# STEP 2: Prepare data for mapping
# Reset index makes 'place' a regular column instead of an index
place_counts = place_counts.reset_index()
# Remove any places that don't have valid coordinates (latitude/longitude)
# This prevents errors when trying to plot points on the map
place_counts = place_counts.dropna(subset=['latitude', 'longitude'])
# STEP 3: Create interactive scatter map
# Each dot represents a place, size = how often it's mentioned
fig = px.scatter_map(
    place_counts,            # Our prepared data
    lat='latitude',          # Y-coordinate (north-south position)
    lon='longitude',         # X-coordinate (east-west position)
    size='count',            # Bigger dots = more mentions
    hover_name='place',      # Show place name when hovering
    hover_data={             # Additional info in hover tooltip
        'count': True,       # Show mention count
        'place_type': True,  # Show what type of place it is
        'latitude': ':.4f',  # Show coordinates with 4 decimal places
        'longitude': ':.4f'
    },
    size_max=30,             # Maximum dot size on map
    zoom=3,                  # How zoomed in the map starts (higher = closer)
    title='Raw Location Data: Places Mentioned in Virginia Tech Reddit Posts',
    center=dict(lat=37.2, lon=-80.4)  # Center map on Blacksburg for Virginia Tech
)
# STEP 4: Customize map appearance
fig.update_layout(
    map_style="carto-darkmatter",  # Dark map style that makes the dots stand out
    width=800,                     # Map width in pixels
    height=600,                    # Map height in pixels
    title_font_size=16,            # Title text size
    title_x=0.5                    # Center the title
)
# Configure for HTML export compatibility
fig.show(config={'displayModeBar': True, 'displaylogo': False})
3.1 Toponym Misalignment Analysis¶
One of the major toponym misalignments important to our analysis involved posts that referred to Virginia Tech as VT. Along with "Tech," VT is one of Virginia Tech's most common nicknames, and most students in Virginia, regardless of which university they attend, recognize the two letters as the university. The geoparser, however, assigned the coordinates of the state of Vermont to posts referring to Virginia Tech as VT. This mistake makes sense: the geoparser would not know the nickname, and VT is more commonly known as the abbreviation for Vermont.
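One way to repair this kind of misalignment programmatically is a small correction table applied before mapping. This is only a sketch: the `fix_toponyms` helper and the sample rows are hypothetical illustrations, not part of `data_cleaning_utils`.

```python
import pandas as pd

# Hypothetical sample mimicking geoparser output: "VT" resolved to Vermont
df = pd.DataFrame({
    "place": ["VT", "Blacksburg", "VT"],
    "latitude": [44.0, 37.2296, 44.0],      # Vermont's latitude on "VT" rows
    "longitude": [-72.7, -80.4139, -72.7],
})

# Known nickname misassignments mapped to a corrected name and coordinates
corrections = {
    "VT": ("Virginia Tech", 37.2284, -80.4234),
}

def fix_toponyms(df, corrections):
    """Copy place columns into revised_* columns, overwriting mislocated rows."""
    df = df.copy()
    df["revised_place"] = df["place"]
    df["revised_latitude"] = df["latitude"]
    df["revised_longitude"] = df["longitude"]
    for raw, (name, lat, lon) in corrections.items():
        mask = df["place"] == raw
        df.loc[mask, ["revised_place", "revised_latitude", "revised_longitude"]] = [name, lat, lon]
    return df

df_fixed = fix_toponyms(df, corrections)
```

This mirrors the manual cleanup done in Google Sheets, but keeps the corrections reproducible in code.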
3.3 Revised Map¶
Some places of note that we looked into were the town of Blacksburg and Virginia Tech as a university. What was interesting was that when Virginia Tech was marked as a location, the post talked about the university as an entity rather than a place or experience. Posts about places and experiences instead referred to the location as Blacksburg, even when they described the Virginia Tech campus and popular student spots. Another surprise was how frequently posts mentioned the highways leading into Blacksburg, such as I-81 and Route 460; this was not a pattern we saw in the JMU corpus. This discrepancy, along with the popularity of Blacksburg as a location, further supports the idea that Virginia Tech students see Virginia Tech as part of Blacksburg, while JMU students see Harrisonburg as part of JMU.
# =============================================================================
# LOAD CLEANED DATA
# =============================================================================
# Load the CSV file you manually cleaned in Google Sheets
# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number
# Replace "UNC_processed_clean.csv" with your institution's cleaned file
df_institution_cleaned = pd.read_csv(
    "group_data_packets/group_5/python/VirginiaTech_processed_clean.csv"
)
# =============================================================================
# APPLY DATA CLEANING FUNCTIONS
# =============================================================================
# Use our custom function to standardize the cleaned data
# Apply the cleaning function to standardize data types and handle missing values
# This function ensures all datasets have the same format for consistent analysis
df_institution_cleaned = clean_institution_dataframe(df_institution_cleaned)
# Display first few rows to verify the cleaning worked properly
# This shows the structure and sample content of your cleaned data
df_institution_cleaned.head()
DataFrame cleaned successfully!
| school_name | unique_id | date | sentences | roberta_compound | place | latitude | longitude | revised_place | revised_latitude | revised_longitude | place_type | false_positive | checked_by | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | VIRGINIATECH | VIRGINIATECH_1589 | 2020-09-05 13:53:29 | To the blonde driver of a sky blue Volkswagen ... | 0.134345 | 360 Main St. | 49.89476 | -97.13848 | S Main Street | 37.212438 | -80.399962 | Road | <NA> | Brea |
| 1 | VIRGINIATECH | VIRGINIATECH_520 | 2024-02-19 02:19:32 | One comment on facebook (not the post you link... | -0.298099 | Abingdon | 51.67109 | -1.28278 | Abingdon | 36.700213 | -81.896434 | City | <NA> | Brea |
| 2 | VIRGINIATECH | VIRGINIATECH_2020 | 2024-03-17 09:39:56 | Did John Roop ever come back to Abingdon or VT... | -0.007829 | Abingdon | 36.70983 | -81.97735 | Abingdon | 36.700213 | -81.896434 | City | <NA> | Brea |
| 3 | VIRGINIATECH | VIRGINIATECH_2021 | 2024-03-17 09:41:38 | Is he back in Abingdon or at VT now? | -0.000851 | Abingdon | 36.70983 | -81.97735 | Abingdon | 36.700213 | -81.896434 | City | <NA> | Brea |
| 4 | VIRGINIATECH | VIRGINIATECH_2023 | 2024-02-22 21:24:59 | .except that his plan was to go home to Abingd... | -0.025051 | Abingdon | 51.67109 | -1.28278 | Abingdon | 36.700213 | -81.896434 | City | <NA> | Brea |
3.4 Map Customization¶
# =============================================================================
# CREATE CLEANED LOCATION MAP (After Manual Corrections)
# =============================================================================
# This map shows your data AFTER you fixed the location errors
# Compare this to the raw map above to see the improvement!
# STEP 1: Count occurrences using CLEANED/CORRECTED location data
# Now we use 'revised_place' instead of 'place' - these are your corrections!
place_counts = (
    df_institution_cleaned.groupby("revised_place")  # Group by corrected place names
    .agg(
        {
            "revised_place": "count",      # Count mentions of each corrected place
            "revised_latitude": "first",   # Use corrected latitude coordinates
            "revised_longitude": "first",  # Use corrected longitude coordinates
            "place_type": "first",         # Keep place type classification
        }
    )
    .rename(columns={"revised_place": "count"})  # Rename count column for clarity
)
# STEP 2: Prepare data for mapping
place_counts = place_counts.reset_index()  # Make 'revised_place' a regular column
# Remove places without valid corrected coordinates
place_counts = place_counts.dropna(subset=["revised_latitude", "revised_longitude"])
# STEP 3: Create the cleaned location map
fig = px.scatter_map(
    place_counts,
    lat="revised_latitude",      # Use corrected Y-coordinates
    lon="revised_longitude",     # Use corrected X-coordinates
    size="count",                # Dot size = mention frequency
    hover_name="revised_place",  # Show corrected place name on hover
    hover_data={
        "count": True,               # Show how many mentions
        "place_type": True,          # Show place category
        "revised_latitude": ":.4f",  # Show corrected coordinates
        "revised_longitude": ":.4f",
    },
    size_max=25,  # Maximum dot size
    title="Cleaned Location Data: Places Mentioned in Virginia Tech Reddit Posts",
    zoom=4,  # 📝 TO DO: Adjust zoom level for your region
    center=dict(lat=37.228834, lon=-80.42351),  # 📝 TO DO: Adjust center coordinates
)
# STEP 4: Customize map appearance
fig.update_layout(
    map_style="carto-darkmatter",  # Dark, readable map style
    width=800,
    height=600,
    title_font_size=16,
    title_x=0.5
)
# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})
Revision Insights¶
While we would expect many Virginia locations to be mentioned in the Virginia Tech corpus, since Virginia Tech is located in Virginia, it is worth noting that the other popular locations were the areas the Virginia Tech student body comes from: Northern Virginia, Richmond, Maryland, New Jersey, New York, and Pennsylvania. The same pattern appears in the JMU corpus, showing a similarity in student body demographics.
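This pattern can be quantified by tallying what share of location mentions fall in those hometown regions. A minimal sketch on hypothetical rows; the region list is our own grouping for illustration, not a column in the data:

```python
import pandas as pd

# Hypothetical cleaned mentions (the real data has many more columns)
df = pd.DataFrame({
    "revised_place": ["Northern Virginia", "Richmond", "Blacksburg",
                      "Richmond", "New Jersey", "Blacksburg"],
})

# Regions we read as "where students are from" rather than "where they study"
hometown_regions = {"Northern Virginia", "Richmond", "Maryland",
                    "New Jersey", "New York", "Pennsylvania"}

# Share of all mentions that point at hometown regions
counts = df["revised_place"].value_counts()
hometown_share = counts[counts.index.isin(hometown_regions)].sum() / counts.sum()
```

Running the same calculation on both corpora would give a single comparable number for the demographic similarity noted above.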
Part 4: Spatial Comparison¶
In this section you are going to compare the spatial distribution of JMU and your institution. You will create two maps using the custom create_location_counts() function.
🔧 Function Parameters¶
This function has two key parameters:
- `minimum_count=` - Sets the minimum number of times a location has to appear before it shows on the map. Setting this to the default `2` means that a location appearing only once will not register.
- `place_type_filter=` - Uses the `place_type` column to filter to only the types of places you want. The default is `None`, but passing a list of place types, e.g. `place_type_filter=["University", "Building"]`, shows only those places. In the sample below, only "State", "City", and "Country" places will show up.
⚠️ NOTE: This does mean you would have had to tag place types properly in the cleanup process
💡 Example Usage¶
create_location_counts(
    df_institution_cleaned,
    minimum_count=2,
    place_type_filter=["State", "City", "Country"]
)
🔧 Customization Instructions¶
⚠️ Important: Make sure that minimum_count and place_type_filter for create_location_counts are the same for both maps.
Visual Customization:
- Tweak the center and zoom of your map to highlight an important contrast
- Set the `color_discrete_sequence=px.colors.qualitative.Plotly` to something of your choice
⚠️ Note: Delete this cell for the final version
4.1 JMU Spatial Distribution¶
# =============================================================================
# JMU SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a filtered map showing only certain types of places for JMU
# STEP 1: Use custom function to filter and count JMU locations
# This function applies the same filtering to both datasets for fair comparison
JMU_filtered_locations = create_location_counts(
    df_jmu,                                         # JMU Reddit data
    minimum_count=2,                                # Only show places mentioned 2+ times
    place_type_filter=['City', 'Building', 'Road']  # Only these place types
)
# STEP 2: Create colored scatter map
# Each place type gets a different color to show spatial patterns
fig = px.scatter_map(
    JMU_filtered_locations,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",        # Dot size = mention frequency
    color="place_type",  # Different colors for different place types
    hover_name="revised_place",
    hover_data={
        "count": True,
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,
    zoom=13,  # 📝 TO DO: Adjust to highlight interesting patterns
    title="Cleaned Location Data: Places Mentioned in JMU Reddit Posts",
    center=dict(lat=38.44582, lon=-78.87058),  # 📝 TO DO: Center on area of interest
    color_discrete_sequence=px.colors.qualitative.Plotly  # Categorical color palette
)
# STEP 3: Customize layout
fig.update_layout(
    map_style="carto-darkmatter",
    width=800,
    height=600,
    title_font_size=16,
    title_x=0.5
)
# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})
# =============================================================================
# YOUR INSTITUTION'S SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a comparable map for your institution using identical filtering
# STEP 1: Apply the same filtering to your institution's data
# Using identical parameters ensures fair comparison with JMU
institution_filtered_locations = create_location_counts(
    df_institution_cleaned,                         # Your cleaned institution data
    minimum_count=2,                                # Same minimum as JMU map
    place_type_filter=["City", "Building", "Road"]  # Same place types as JMU
)
# STEP 2: Create matching visualization
# Keep all settings the same as JMU map for direct comparison
fig_institution_cleaned = px.scatter_map(
    institution_filtered_locations,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",
    color="place_type",  # Same color coding as JMU map
    hover_name="revised_place",
    hover_data={
        "count": True,
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,  # Same size scale as JMU
    zoom=12,      # 📝 TO DO: Adjust for your region
    title="Cleaned Location Data: Places Mentioned in Virginia Tech Reddit Posts",
    center=dict(lat=37.23040, lon=-80.41500),  # 📝 TO DO: Center on your region
    color_discrete_sequence=px.colors.qualitative.Plotly,  # Same colors as JMU
)
# STEP 3: Apply identical layout settings
fig_institution_cleaned.update_layout(
    map_style="carto-darkmatter",  # Same style as JMU map
    width=800,
    height=600,
    title_font_size=16,
    title_x=0.5
)
# Display with HTML export configuration
fig_institution_cleaned.show(config={'displayModeBar': True, 'displaylogo': False})
4.2 Spatial Analysis¶
✍️ Writing Task¶
Write a paragraph on an important spatial difference or similarity between the two datasets that confirms or complicates your hypothesis.
💡 Example: "While we theorized that UNC would have a significant number of posts about the Southeast, the mapping data does not reveal this. Instead, the Reddit feed rarely speaks about states outside of North Carolina, and when it does it is about institutions out west."
Part 5: Sentiment Analysis Comparison¶
# =============================================================================
# JMU SENTIMENT ANALYSIS MAP
# =============================================================================
# Shows the EMOTIONAL tone of how JMU students talk about different places
# Red = negative emotions, Green = positive emotions
# STEP 1: Calculate average sentiment scores by location
# This function groups identical locations and averages their sentiment scores
df_jmu_sentiment = create_location_sentiment(
    df_jmu,                 # JMU Reddit data with sentiment scores
    minimum_count=2,        # Only places mentioned 2+ times (for reliability)
    place_type_filter=None  # Include all place types for comprehensive view
)
# STEP 2: Create sentiment visualization map
# Color represents emotional tone: Green = positive, Red = negative, Yellow = neutral
fig_sentiment = px.scatter_map(
    df_jmu_sentiment,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",           # Larger dots = more mentions (more reliable sentiment)
    color="avg_sentiment",  # Color intensity = emotional tone
    color_continuous_scale="RdYlGn",  # Red-Yellow-Green scale (Red=negative, Green=positive)
    hover_name="revised_place",
    hover_data={
        "count": True,            # How many posts contributed to this sentiment
        "avg_sentiment": ":.3f",  # Average sentiment score (3 decimal places)
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,
    zoom=12,  # 📝 TO DO: Adjust to focus on interesting patterns
    title="Average Sentiment by Location in JMU Reddit Posts",
    center=dict(lat=38.44602, lon=-78.86983),  # 📝 TO DO: Center on region of interest
)
# STEP 3: Customize layout for sentiment analysis
fig_sentiment.update_layout(
    map_style="carto-darkmatter",  # Dark background to highlight sentiment colors
    width=800,
    height=600,
    title_font_size=16,
    title_x=0.5
)
# Display with HTML export configuration
fig_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})
# =============================================================================
# YOUR INSTITUTION'S SENTIMENT ANALYSIS MAP
# =============================================================================
# Compare emotional patterns between your institution and JMU
# STEP 1: Calculate sentiment for your institution using identical methods
institution_sentiment = create_location_sentiment(
    df_institution_cleaned,  # Your cleaned institution data
    minimum_count=2,         # Same minimum as JMU (ensures fair comparison)
    place_type_filter=None   # Same filter as JMU (include all place types)
)
# STEP 2: Create matching sentiment visualization
# Use identical settings to JMU map for direct comparison
fig_institution_sentiment = px.scatter_map(
    institution_sentiment,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",
    color="avg_sentiment",
    color_continuous_scale="RdYlGn",  # Same color scale as JMU map
    hover_name="revised_place",
    hover_data={
        "count": True,
        "avg_sentiment": ":.3f",
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,  # Same size scale as JMU
    zoom=13,      # 📝 TO DO: Adjust for your region
    title="Average Sentiment by Location in Virginia Tech Reddit Posts",
    center=dict(lat=37.22986, lon=-80.41380),  # 📝 TO DO: Center on your institution's region
)
# STEP 3: Apply identical layout for comparison
fig_institution_sentiment.update_layout(
    map_style="carto-darkmatter",  # Same background as JMU map
    width=800,
    height=600,
    title_font_size=16,
    title_x=0.5
)
# Display with HTML export configuration
fig_institution_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})
Sentiment Comparison Analysis¶
The two sentiment maps support our hypothesis that Virginia Tech students have a more positive relationship with Blacksburg than JMU students have with Harrisonburg. On the Virginia Tech sentiment map, larger and greener circles appear in Blacksburg rather than on the campus itself, clustered around town spaces like bars and businesses. This shows that VT students mention local places often and feel positively about them. In contrast, the Harrisonburg map shows smaller and more neutral or negative circles, most of them at JMU-specific locations, and posts that mention Harrisonburg itself tend to carry more negative sentiment. This comparison suggests that VT students engage more with their town community, while JMU students are more centered on campus.
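The map comparison above can also be checked numerically by averaging the `roberta_compound` scores of posts that mention each town. A minimal sketch on hypothetical rows; the real comparison would run on `df_institution_cleaned` and `df_jmu`, and the scores below are invented:

```python
import pandas as pd

# Hypothetical rows using the same columns as the cleaned corpora
df_vt = pd.DataFrame({
    "revised_place": ["Blacksburg", "Blacksburg", "Virginia Tech"],
    "roberta_compound": [0.6, 0.4, -0.1],
})
df_jmu_sample = pd.DataFrame({
    "revised_place": ["Harrisonburg", "Harrisonburg", "JMU"],
    "roberta_compound": [-0.2, 0.1, 0.3],
})

def mean_sentiment(df, place):
    """Average compound sentiment across posts mentioning one place."""
    return df.loc[df["revised_place"] == place, "roberta_compound"].mean()

vt_town = mean_sentiment(df_vt, "Blacksburg")              # 0.5 on this sample
jmu_town = mean_sentiment(df_jmu_sample, "Harrisonburg")   # -0.05 on this sample
```

A single town-level average like this complements the maps, which show where the sentiment sits but not an overall figure.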
Part 6: Time Series Animation Analysis¶
# =============================================================================
# ANIMATED TIME SERIES: SENTIMENT CHANGES OVER TIME
# =============================================================================
# Watch how places accumulate mentions and sentiment changes over time
# This reveals temporal patterns in student discussions
# STEP 1: Prepare animation data with rolling averages
# This function creates monthly frames showing cumulative growth and sentiment trends
institution_animation = create_time_animation_data(
    df_institution_cleaned,  # Your cleaned institution data
    window_months=3,         # 3-month rolling average (smooths out noise)
    minimum_count=2,         # Only places with 2+ total mentions
    place_type_filter=None   # Include all place types (📝 TO DO: experiment with filtering)
)
# STEP 2: Create animated scatter map
# Each frame represents one month, showing cumulative mentions and current sentiment
fig_animated = px.scatter_map(
    institution_animation,
    lat="revised_latitude",
    lon="revised_longitude",
    size="cumulative_count",          # Dot size = total mentions up to this point in time
    color="rolling_avg_sentiment",    # Color = 3-month average sentiment (smoother than daily)
    animation_frame="month",          # Each frame = one month of data
    animation_group="revised_place",  # Keep same places connected across frames
    hover_name="revised_place",
    hover_data={
        "cumulative_count": True,         # Total mentions so far
        "rolling_avg_sentiment": ":.3f",  # Smoothed sentiment score
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f"
    },
    color_continuous_scale="RdYlGn",  # Same sentiment colors as static maps
    size_max=30,  # Slightly larger max size for animation visibility
    zoom=12,      # 📝 TO DO: Adjust zoom for your region
    title="Institution Reddit Posts: Cumulative Location Mentions & Rolling Average Sentiment Over Time",
    center=dict(lat=37.226361, lon=-80.410084),  # 📝 TO DO: Center on your institution's area
    range_color=[-0.5, 0.5]  # Fixed color range for consistent comparison across time
)
# STEP 3: Customize animation settings and layout
fig_animated.update_layout(
    map_style="carto-darkmatter",
    width=800,
    height=600,
    title_font_size=16,
    title_x=0.5,
    coloraxis_colorbar=dict(  # Customize the sentiment legend
        title="Rolling Avg<br>Sentiment",
        tickmode="linear",
        tick0=-0.5,  # Start legend at -0.5 (most negative)
        dtick=0.25   # Tick marks every 0.25 points
    )
)
# STEP 4: Set animation timing (in milliseconds)
# 📝 TO DO: Experiment with these values for optimal viewing
fig_animated.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 800       # Time between frames
fig_animated.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300  # Transition smoothness
# Display with HTML export configuration
fig_animated.show(config={'displayModeBar': True, 'displaylogo': False})
Time Series Analysis¶
The sentiment around Blacksburg grows larger and rotates between positive, negative, and neutral, ultimately ending positive. The sentiment around VT as a university grows more slowly and ends negative by the close of the time frame. This supports our hypothesis because the sentiment around Blacksburg is more positive than the sentiment James Madison students express about the community of Harrisonburg. We also noticed that Blacksburg turned negative around February and March 2023, most likely due to a Reddit post reporting that the Virginia Tech Board of Visitors would raise tuition by up to 8 percent. There is also a spike in positive emotion on November 25, 2023, most likely because Virginia Tech beat UVA in football. By analyzing sentiment over time, we can see how specific events shape student opinion of their campus and surrounding community.
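Event-driven swings like these can be inspected directly by resampling one place's mentions to monthly means and smoothing with a rolling window, the same idea `create_time_animation_data` applies. A minimal sketch on hypothetical dates and scores chosen to echo the dip and spike described above:

```python
import pandas as pd

# Hypothetical mentions of one place: a tuition-news dip in Feb-Mar 2023,
# a football-win spike in late November (scores are invented)
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-15", "2023-02-20", "2023-03-05",
                            "2023-11-25", "2023-12-01"]),
    "roberta_compound": [0.2, -0.5, -0.4, 0.8, 0.6],
})

# Monthly mean sentiment ("MS" = month-start bins), then a 3-month
# rolling average to smooth out single-post noise
monthly = df.set_index("date")["roberta_compound"].resample("MS").mean()
rolling = monthly.rolling(window=3, min_periods=1).mean()
```

Plotting `monthly` against `rolling` for Blacksburg would make the February-March dip and the November spike easy to line up with the events they follow.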
Part 7: Conclusion and Future Research¶
Our findings support our hypothesis that Virginia Tech students are more community-focused than JMU students. The Virginia Tech corpus puts greater emphasis on Blacksburg when discussing the school, community, and experiences, while JMU students focus on JMU as a school when discussing similar topics. One shortcoming of our research concerns sentiment over time: we expected Virginia Tech students to have a consistently better relationship with their community, but the time series map showed a seesaw of positive and negative sentiment across the years. Another issue is that the corpus includes the years when COVID affected college campuses, which produced a lot of negative sentiment about both schools even though it was a global issue. One improvement would be to remove the COVID-impacted years from the corpus, though this raises the question of whether cleaning the data that way would still accurately represent it; another would be to focus on more recent years. Our findings about spatial sentiment show that opinions about a location vary with the years and the events taking place, and suggest that different places foster different cultures within a broader community. Specifically, when looking at college campuses, it was interesting to see how campus culture varies with the preexisting community.